A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval
Authors
Abstract
MOTIVATION: The performance of classifiers is often assessed using Receiver Operating Characteristic (ROC) curves [or accumulation curves (AC), or enrichment curves] and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of interest, and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this 'early retrieval' problem.
RESULTS: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived, and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization, the CROC(exp), an exponential transform of the ROC curve, as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way of measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.
AVAILABILITY: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/
CONTACT: [email protected]
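The idea of magnifying the early part of the ROC curve can be illustrated with a minimal sketch. It assumes an exponential magnification function of the form f(x) = (1 - exp(-αx)) / (1 - exp(-α)) applied to the false positive rate, which is one plausible reading of the "exponential transform" named in the abstract. The function names, the default α = 7 and the toy data are illustrative assumptions, and the code does not use the API of the CROC package mentioned above.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays for a ranking by descending score.

    Tied scores are not grouped; each item adds one step to the curve.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels)[order]
    tp = np.concatenate(([0], np.cumsum(y == 1)))
    fp = np.concatenate(([0], np.cumsum(y == 0)))
    return fp / fp[-1], tp / tp[-1]

def exp_magnify(x, alpha=7.0):
    """Assumed exponential magnifier: maps [0, 1] onto [0, 1], stretching small x."""
    return (1.0 - np.exp(-alpha * x)) / (1.0 - np.exp(-alpha))

def area(x, y):
    """Trapezoidal area under the piecewise-linear curve through (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def auc_croc(scores, labels, alpha=7.0):
    """Area under the ROC curve after magnifying the FPR axis (hypothetical helper)."""
    fpr, tpr = roc_points(scores, labels)
    return area(exp_magnify(fpr, alpha), tpr)

# Toy data (made up): two rankings of 2 actives and 4 inactives.
scores = [6, 5, 4, 3, 2, 1]
labels_a = [1, 0, 0, 1, 0, 0]  # one active ranked first
labels_b = [0, 1, 1, 0, 0, 0]  # both actives ranked just below the top

auc_a = area(*roc_points(scores, labels_a))
auc_b = area(*roc_points(scores, labels_b))
croc_a = auc_croc(scores, labels_a)
croc_b = auc_croc(scores, labels_b)
print(auc_a, auc_b, croc_a, croc_b)
```

In this toy case both rankings have the same plain AUC, while the magnified area is larger for the ranking that places an active at the very top. That is the kind of early retrieval difference the abstract says ROC and AUC fail to capture.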
Similar resources
CROC: A New Evaluation Criterion for Recommender Systems
Evaluation of a recommender system algorithm is a challenging task due to the many possible scenarios in which such systems may be deployed. We have designed a new performance plot called the CROC curve with an associated statistic: the area under the curve. Our CROC curve supplements the widely used ROC curve in recommender system evaluation by discovering performance characteristics that stan...
Bridging the Gap Between Neural Network and Kernel Methods: Applications to Drug Discovery
We develop a hybrid machine learning architecture, the Influence Relevance Voter (IRV), where an initial geometry- or kernel-based step is followed by a feature-based step to derive the final prediction. While other implementations of the general idea are possible, we use a k-Nearest-Neighbor approach to implement the first step, and a Neural Network approach to implement the second step for a cla...
Collapsing ROC approach for risk prediction research on both common and rare variants
Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not y...
Optimizing Area Under the ROC Curve using Ranking SVMs
Area Under the ROC Curve (AUC), often used for comparing classifiers, is a widely accepted performance measure for ranking instances. Many researchers have studied optimization of AUC, usually via optimizing some approximation of a ranking function. Ranking SVMs are among the better performers but their usage in the literature is typically limited to learning a total ranking from partial ranking...
The Drosophila fork head domain protein crocodile is required for the establishment of head structures.
The fork head (fkh) domain defines the DNA-binding region of a family of transcription factors which has been implicated in regulating cell fate decisions across species lines. We have cloned and molecularly characterized the crocodile (croc) gene which encodes a new family member from Drosophila. croc is expressed in the head anlagen of the blastoderm embryo under the control of the anterior, ...
Journal:
- Bioinformatics
Volume 26, Issue 10
Pages: -
Publication date: 2010